Geometry of the Restricted Boltzmann Machine

María Angélica Cueto, Jason Morton, and Bernd Sturmfels



Abstract. The restricted Boltzmann machine is a graphical model for binary 
random variables. Based on a complete bipartite graph separating hidden and 
observed variables, it is the binary analog to the factor analysis model. We 
study this graphical model from the perspectives of algebraic statistics and 
tropical geometry, starting with the observation that its Zariski closure is a 
Hadamard power of the first secant variety of the Segre variety of projective 
lines. We derive a dimension formula for the tropicalized model, and we use it 
to show that the restricted Boltzmann machine is identifiable in many cases. 
Our methods include coding theory and geometry of linear threshold functions. 

1. Introduction

A primary focus in algebraic statistics is the study of statistical models that 
can be represented by polynomials in the model parameters. This class of algebraic 
statistical models includes graphical models for both Gaussian and discrete random 
variables [13]. In this article we study a family of binary graphical models with
hidden variables. The underlying graph is the complete bipartite graph $K_{k,n}$.

Figure 1: Graphical representation of the restricted Boltzmann machine.

The k white nodes in the top row of Figure 1 represent hidden random variables.
The n black nodes in the bottom row represent observed random variables. The
restricted Boltzmann machine (RBM) is the undirected graphical model for binary
random variables specified by this bipartite graph. We identify the model with its
set of joint distributions, which is a subset $M_n^k$ of the probability simplex $\Delta_{2^n-1}$.

The graphical model for Gaussian random variables represented by Figure 1
is the factor analysis model, whose algebraic properties were studied in [12].
Thus, the restricted Boltzmann machine is the binary undirected analog of factor
analysis. Our aim here is to study this model from the perspectives of algebra and
geometry. Unlike in the factor analysis study [12], an important role will now be
played by tropical geometry [26]. This was already seen for n = 4 and k = 2 in the
solution by Cueto and Yu of the implicitization challenge in [13, Problem 7.7].

The restricted Boltzmann machine has been the subject of a recent resurgence 
of interest due to its role as the building block of the deep belief network. Deep 
belief networks are designed to learn feature hierarchies to automatically find high- 
level representations for high-dimensional data. A deep belief network comprises a 
stack of restricted Boltzmann machines. Given a piece of data (state of the lowest 
visible variables), each layer's most likely hidden states are treated as data for the 
next layer. A new effective training methodology for deep belief networks, which 
begins by training each layer in turn as an RBM using contrastive divergence, was 
introduced by Hinton et al. [16]. This method led to many new applications in
general machine learning problems including object recognition and dimensionality
reduction [17]. While promising for practical applications, the scope and basic
properties of these statistical models have only begun to be studied. For example,
Le Roux and Bengio [21] showed that any distribution with support on r visible
states may be arbitrarily well approximated provided there are at least r + 1 hidden
nodes. Therefore, any distribution can be approximated with $2^n + 1$ hidden nodes.

The question which started this project is whether the restricted Boltzmann 
machine is identifiable. The dimension of the fully observed binary graphical model 
on $K_{k,n}$ is equal to nk + n + k, the number of nodes plus the number of edges. We
conjecture that this dimension is preserved under the projection corresponding to 
the algebraic elimination of the k hidden variables. Here is the precise statement: 

Conjecture 1.1. The restricted Boltzmann machine has the expected dimension,
i.e. it is a semialgebraic set of dimension $\min\{nk + n + k, 2^n - 1\}$ in $\Delta_{2^n-1}$.

This conjecture is shown to be true in many special cases. In particular, it 
holds for all k when n + 1 is a power of 2. This is a consequence of the following:

Theorem 1.2. The restricted Boltzmann machine has the expected dimension
$\min\{nk + n + k, 2^n - 1\}$ when $k \le 2^{n - \lceil \log_2(n+1) \rceil}$ and when $k \ge 2^{n - \lfloor \log_2(n+1) \rfloor}$.

We note that Theorem 1.2 covers most cases of restricted Boltzmann machines
as used in practice, as those generally satisfy $k \le 2^{n - \lceil \log_2(n+1) \rceil}$. The case of large
k is primarily of theoretical interest and has been studied recently in [21].
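
To make the two thresholds in Theorem 1.2 concrete, the following Python sketch tabulates them for small n. It assumes the non-strict reading of the two inequalities, and the function names are ours rather than the paper's. When n + 1 is a power of 2 the two thresholds coincide, which is why every k is covered in that case.

```python
def expected_dimension(n: int, k: int) -> int:
    """Expected dimension min{nk + n + k, 2^n - 1} from Conjecture 1.1."""
    return min(n * k + n + k, 2 ** n - 1)


def theorem_1_2_thresholds(n: int) -> tuple:
    """Return (2^(n - ceil(log2(n+1))), 2^(n - floor(log2(n+1)))) for n >= 1."""
    ceil_log = n.bit_length()             # equals ceil(log2(n + 1)) for n >= 1
    floor_log = (n + 1).bit_length() - 1  # equals floor(log2(n + 1))
    return 2 ** (n - ceil_log), 2 ** (n - floor_log)


if __name__ == "__main__":
    for n in range(2, 9):
        low, high = theorem_1_2_thresholds(n)
        note = "all k covered" if low == high else "gap between thresholds"
        print(n, low, high, note)
```

Integer bit lengths are used instead of floating-point logarithms so that the floor and ceiling are exact.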

This paper is organized as follows. In Section 2 we introduce four geometric 
objects, namely, the RBM model, the RBM variety, the tropical RBM model, and 
the tropical RBM variety, and we formulate a strengthening of Conjecture 1.1.
Section 3 is concerned with the case k = 1. Here the RBM variety is the variety
of secant lines of the Segre variety $(\mathbb{P}^1)^n \subset \mathbb{P}^{2^n-1}$. The general case k > 1 arises
from that secant variety by way of a construction we call the Hadamard product
of projective varieties, as shown in Proposition 2.1. In Section 4 we analyze the
tropical RBM model, we establish a formula for its dimension (Theorem 4.2), and
we draw on results from coding theory to derive Theorem 1.2 and Table 1. In
Section 5 we study the piecewise-linear map that parameterizes the tropical RBM
model. The inference functions of the model (in the sense of [14, 26]) are k-tuples
of linear threshold functions. We discuss the number of these functions. Figure 5
shows the combinatorial structure of the tropical RBM model for n = 3 and k = 1.

2. Algebraic Varieties, Hadamard Product and Tropicalization 

We begin with an alternative definition of the restricted Boltzmann machine. 
This "machine" is a statistical model for binary random variables where n of the 
variables are visible and k of the variables are hidden. The states of the hidden 
and visible variables are written as binary vectors $h \in \{0,1\}^k$ and $v \in \{0,1\}^n$
respectively. We introduce nk + n + k model parameters, namely, the entries of a
real k x n matrix W and the entries of two vectors $b \in \mathbb{R}^n$ and $c \in \mathbb{R}^k$, and we set

(1) $\psi(v,h) = \exp(h^\top W v + b^\top v + c^\top h)$.

The probability distribution on the visible random variables in our model equals 

(2) $p(v) = \frac{1}{Z} \cdot \sum_{h \in \{0,1\}^k} \psi(v,h)$,

where $Z = \sum_{v,h} \psi(v,h)$ is the partition function. We denote by $M_n^k$ the subset of
the open probability simplex $\Delta_{2^n-1}$ consisting of all such distributions $(p(v) : v \in \{0,1\}^n)$
as the parameters W, b and c run over $\mathbb{R}^{k \times n}$, $\mathbb{R}^n$ and $\mathbb{R}^k$ respectively.

In what follows we refer to $M_n^k$ as the RBM model with n visible nodes and
k hidden nodes. It coincides with the binary graphical model associated with the
complete bipartite graph $K_{k,n}$ as described in the Introduction. This is indicated
in Figure 1 by the labeling with the states v, h and the model parameters c, W, b.

The parameterization in (1) is not polynomial because it involves the exponential
function. However, it is equivalent to the polynomial parameterization obtained
by replacing each model parameter by its value under the exponential function:

$\gamma_i = \exp(c_i)$, $\omega_{ij} = \exp(W_{ij})$, $\beta_j = \exp(b_j)$.

This coordinate change translates (1) into the squarefree monomial

$\psi(v,h) = \prod_{i=1}^k \gamma_i^{h_i} \cdot \prod_{i=1}^k \prod_{j=1}^n \omega_{ij}^{h_i v_j} \cdot \prod_{j=1}^n \beta_j^{v_j}$,

and we see that the probabilities in (2) can be factored as follows:

(3) $p(v) = \frac{1}{Z} \beta_1^{v_1} \beta_2^{v_2} \cdots \beta_n^{v_n} \prod_{i=1}^k (1 + \gamma_i \omega_{i1}^{v_1} \omega_{i2}^{v_2} \cdots \omega_{in}^{v_n})$ for $v \in \{0,1\}^n$.

The RBM model is the image of the polynomial map $\mathbb{R}_{>0}^{nk+n+k} \to \Delta_{2^n-1}$ whose
vth coordinate equals (3). This shows that $M_n^k$ is a semialgebraic subset of $\Delta_{2^n-1}$.
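
As a numerical illustration (not part of the original text), the following Python sketch computes the distribution p(v) in two ways: by summing (1) over all hidden states as in (2), and by the factored form (3). The function names and the parameter values are hypothetical, chosen only for the example.

```python
from itertools import product
from math import exp, prod


def rbm_by_summation(W, b, c):
    """p(v) from (1)-(2): sum psi(v, h) over all hidden states h, then normalize."""
    k, n = len(c), len(b)
    unnormalized = {}
    for v in product((0, 1), repeat=n):
        total = 0.0
        for h in product((0, 1), repeat=k):
            exponent = sum(h[i] * W[i][j] * v[j] for i in range(k) for j in range(n))
            exponent += sum(b[j] * v[j] for j in range(n))
            exponent += sum(c[i] * h[i] for i in range(k))
            total += exp(exponent)
        unnormalized[v] = total
    Z = sum(unnormalized.values())
    return {v: value / Z for v, value in unnormalized.items()}


def rbm_by_factorization(W, b, c):
    """p(v) from the factored form (3) in the coordinates gamma, omega, beta."""
    k, n = len(c), len(b)
    gamma = [exp(x) for x in c]
    beta = [exp(x) for x in b]
    omega = [[exp(W[i][j]) for j in range(n)] for i in range(k)]
    unnormalized = {}
    for v in product((0, 1), repeat=n):
        visible_part = prod(beta[j] ** v[j] for j in range(n))
        hidden_part = prod(
            1 + gamma[i] * prod(omega[i][j] ** v[j] for j in range(n))
            for i in range(k)
        )
        unnormalized[v] = visible_part * hidden_part
    Z = sum(unnormalized.values())
    return {v: value / Z for v, value in unnormalized.items()}


# Hypothetical parameters for n = 3 visible and k = 2 hidden nodes.
W = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.75]]
b = [0.1, -0.2, 0.3]
c = [-0.4, 0.6]
p_sum = rbm_by_summation(W, b, c)
p_fac = rbm_by_factorization(W, b, c)
```

The summation runs over 2^k hidden states per visible state, while the factored form needs only k factors, which is the practical content of (3).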

When faced with a high-dimensional semialgebraic set arising in statistics, it is 
often useful to simplify the situation by disregarding all inequalities and by replacing 
the real numbers $\mathbb{R}$ by the complex numbers $\mathbb{C}$. This leads us to considering the
Zariski closure $V_n^k$ of the RBM model $M_n^k$. This is the algebraic variety in the
complex projective space $\mathbb{P}^{2^n-1}$ parameterized by (3). We call $V_n^k$ the RBM variety.

Given any two subvarieties X and Y of a projective space $\mathbb{P}^m$, we define their
Hadamard product X * Y to be the closure of the image of the rational map

$X \times Y \dashrightarrow \mathbb{P}^m$, $(x, y) \mapsto (x_0 y_0 : x_1 y_1 : \ldots : x_m y_m)$.

For any projective variety X, we may consider its Hadamard square $X^{[2]} = X * X$
and its higher Hadamard powers $X^{[k]} = X * X^{[k-1]}$. If M is a subset of the
open simplex $\Delta_{m-1}$ then its Hadamard powers $M^{[k]}$ are also defined by componentwise
multiplication followed by rescaling so that the coordinates sum to one. This
construction is compatible with taking Zariski closures, i.e. we have $\overline{M^{[k]}} = \overline{M}^{[k]}$.

In the next section we shall take a closer look at the case k = 1, and we shall
recognize $V_n^1$ as a secant variety and as a phylogenetic model. Here, we prove
that the case of k > 1 hidden nodes reduces to k = 1 using Hadamard powers.

Proposition 2.1. The RBM variety and model factor as Hadamard powers:

$V_n^k = (V_n^1)^{[k]}$ and $M_n^k = (M_n^1)^{[k]}$.

Proof. A strictly positive vector p with coordinates p(v) as in (3) admits
a componentwise factorization into similar vectors for k = 1, and, conversely, the
componentwise product of k probability distributions in $M_n^1$ becomes a distribution
in $M_n^k$ after division by the partition function. Hence $M_n^k = (M_n^1)^{[k]}$ in $\Delta_{2^n-1}$. The
equation $V_n^k = (V_n^1)^{[k]}$ follows by passing to the Zariski closure in $\mathbb{P}^{2^n-1}$. □
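
The factorization in the proof can be checked numerically. The Python sketch below is an illustration under our own choices: the helper names are hypothetical, and the visible biases b are split equally among the k factors, which is one of many valid splittings. It multiplies k one-hidden-node distributions componentwise, rescales, and compares the result with the k-hidden-node distribution.

```python
from itertools import product
from math import exp


def rbm(W, b, c):
    """RBM distribution on {0,1}^n, using the factored form (3)."""
    n = len(b)
    weights = {}
    for v in product((0, 1), repeat=n):
        w = exp(sum(bj * vj for bj, vj in zip(b, v)))
        for row, ci in zip(W, c):
            w *= 1 + exp(sum(wij * vj for wij, vj in zip(row, v)) + ci)
        weights[v] = w
    Z = sum(weights.values())
    return {v: w / Z for v, w in weights.items()}


def hadamard_product(distributions):
    """Componentwise product, rescaled so that the coordinates sum to one."""
    states = list(distributions[0])
    weights = {v: 1.0 for v in states}
    for p in distributions:
        for v in states:
            weights[v] *= p[v]
    Z = sum(weights.values())
    return {v: w / Z for v, w in weights.items()}


# Hypothetical parameters with n = 3 and k = 2.
W = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.75]]
b = [0.1, -0.2, 0.3]
c = [-0.4, 0.6]
k = len(c)

# Split into k one-hidden-node models; the visible biases are shared equally.
factors = [rbm([W[i]], [bj / k for bj in b], [c[i]]) for i in range(k)]
p_full = rbm(W, b, c)
p_hadamard = hadamard_product(factors)
```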

The emerging field of tropical mathematics is predicated on the idea that
log(exp(x) + exp(y)) is approximately equal to max(x, y) when x and y are quantities
of different scale. For a first introduction see [30], and for further reading see
[4, 8, 10, 23] and references therein. The process of passing from ordinary arithmetic
to the max-plus algebra is known as tropicalization. The same approximation
motivates the definition of the softmax function in the neural networks literature. 
A statistical perspective is offered in work by Pachter and the third author [27, 26].

If q(v) approximates log(p(v)) in the sense of tropical mathematics, and if we
disregard the global additive constant $-\log Z$, then (2) translates into the formula

(4) $q(v) = \max\{h^\top W v + b^\top v + c^\top h : h \in \{0,1\}^k\}$.

This expression is a piecewise-linear concave function $\mathbb{R}^{nk+n+k} \to \mathbb{R}$ on the space
of model parameters (W, b, c). As v ranges over $\{0,1\}^n$, there are $2^n$ such concave
functions, and these form the coordinates of a piecewise-linear map

(5) $\Phi : \mathbb{R}^{nk+n+k} \to \mathbb{TP}^{2^n-1}$.

Here $\mathbb{TP}^{2^n-1}$ denotes the tropical projective space $\mathbb{R}^{2^n} / \mathbb{R}(1, \ldots, 1)$, as in [4, 10].

The image of the map $\Phi$ is denoted $TM_n^k$ and is called the tropical RBM model.
The map $\Phi$ is the tropicalization of the given parameterization of the RBM model.
It is our objective to investigate its geometric properties. 
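
The passage from (2) to (4) can be illustrated numerically. In the Python sketch below (function names and parameter values are our own assumptions), `q_tropical` evaluates (4) by brute force, `q_closed_form` takes the maximum one hidden node at a time, and `q_soft` evaluates (1/t) log of the sum over h of exp(t times the score), which lies between the maximum and the maximum plus k log(2)/t.

```python
from itertools import product
from math import exp, log


def scores(W, b, c, v):
    """All values h^T W v + b^T v + c^T h as h ranges over {0,1}^k."""
    k = len(c)
    visible = sum(bj * vj for bj, vj in zip(b, v))
    activations = [sum(wij * vj for wij, vj in zip(row, v)) + ci
                   for row, ci in zip(W, c)]
    return [visible + sum(hi * a for hi, a in zip(h, activations))
            for h in product((0, 1), repeat=k)]


def q_tropical(W, b, c, v):
    """Formula (4): maximum of the scores over all hidden states."""
    return max(scores(W, b, c, v))


def q_closed_form(W, b, c, v):
    """The same maximum, taken one hidden node at a time."""
    visible = sum(bj * vj for bj, vj in zip(b, v))
    return visible + sum(
        max(0.0, sum(wij * vj for wij, vj in zip(row, v)) + ci)
        for row, ci in zip(W, c)
    )


def q_soft(W, b, c, v, t):
    """(1/t) * log sum_h exp(t * score_h), computed in a numerically stable way."""
    s = scores(W, b, c, v)
    m = max(s)
    return m + log(sum(exp(t * (x - m)) for x in s)) / t


# Hypothetical parameters with n = 3 and k = 2.
W = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.75]]
b = [0.1, -0.2, 0.3]
c = [-0.4, 0.6]
cube = list(product((0, 1), repeat=3))
```

Increasing t corresponds to scaling all parameters by t, so the gap between `q_soft` and `q_tropical` shrinks like 1/t.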

This situation fits precisely into the general scheme of parametric maximum a
posteriori (MAP) inference introduced in [26] and studied in more detail by Elizalde
and Woods [14]. In Section 5 below, we discuss the statistical relevance of the map
$\Phi$ and we examine its geometric properties. Of particular interest are the domains
of linearity of $\Phi$ and how these are mapped onto the cones of the model $TM_n^k$.

Finally, we define the tropical RBM variety $TV_n^k$ to be the tropicalization of
the RBM variety $V_n^k$. As explained in [27, §3.4] and [26, §3], the tropical variety
$TV_n^k$ is the intersection in $\mathbb{TP}^{2^n-1}$ of all the tropical hypersurfaces T(f) where f
runs over all polynomials that vanish on $V_n^k$ (or on $M_n^k$). By definition, T(f) is the
union of all codimension one cones in the normal fan of the Newton polytope of
f. If the homogeneous prime ideal of the variety $V_n^k$ were known then the tropical
variety $TV_n^k$ could in theory be computed using the algorithms that are
implemented in the software Gfan [19]. However, this prime ideal is not known
in general. In fact, even for small instances, its computation is very hard and relies
primarily on tropical geometry techniques such as the ones developed in [7]. For
instance, the main result in [7] states that the RBM variety $V_4^2$ is a hypersurface of
degree 110 in $\mathbb{P}^{15}$, and it remains a challenge to determine a formula for the defining
irreducible polynomial of this hypersurface. To appreciate this challenge, note that
the number of monomials in the relevant multidegree equals 5 529 528 561 944.

Here is a brief summary of the four geometric objects we have introduced: 

• The semialgebraic set $M_n^k \subset \Delta_{2^n-1}$ of probability distributions represented
by the restricted Boltzmann machine. We call $M_n^k$ the RBM model.

• The Zariski closure $V_n^k$ of the RBM model $M_n^k$. This is an algebraic variety
in the complex projective space $\mathbb{P}^{2^n-1}$. We call $V_n^k$ the RBM variety.

• The tropicalization $TV_n^k$ of the variety $V_n^k$. This is a tropical variety in the
tropical projective space $\mathbb{TP}^{2^n-1}$. We call $TV_n^k$ the tropical RBM variety.

• The image $TM_n^k$ of the tropicalized parameterization $\Phi$. This is the subset
of $\mathbb{TP}^{2^n-1}$ consisting of all optimal score value vectors in the MAP
inference problem for the RBM. We call $TM_n^k$ the tropical RBM model.

We have inclusions $M_n^k \subset V_n^k$ and $TM_n^k \subset TV_n^k$. The latter inclusion is the content
of the second statement in [26, Theorem 2]. We shall see that both inclusions
are strict even for k = 1. For example, $M_3^1$ is a proper subset of $V_3^1 \cap \Delta_7 = \Delta_7$
since points in this set must satisfy the inequality $\sigma_{12}\sigma_{13}\sigma_{23} \ge 0$, as indicated in
Theorem 3.4 below. Likewise, $TM_3^1$ is a proper subfan of $\mathbb{TP}^7 = TV_3^1$. This subfan
will be determined in our discussion of the secondary fan structure in Example 5.2.

The dimensions of our four geometric objects satisfy the following chain of
equations and inequalities:

(6) $\dim(TM_n^k) \le \dim(TV_n^k) = \dim(V_n^k) = \dim(M_n^k) \le \min\{nk + n + k, 2^n - 1\}$.

Here, the tropical objects $TM_n^k$ and $TV_n^k$ are polyhedral fans, and by their dimension
we mean the dimension of any cone of maximal dimension in the fan. When
speaking of the dimension of $V_n^k$ we mean the Krull dimension of the projective
variety, and for the model $M_n^k$ we mean its dimension as a semialgebraic set.

The leftmost inequality in (6) holds because $TM_n^k \subset TV_n^k$. The left equality
holds by the Bieri-Groves Theorem (cf. [10, Theorem 4.5]) which ensures that
every irreducible variety has the same dimension as its tropicalization. The second
equality follows from standard real algebraic geometry results because $M_n^k$ has a
regular point and is Zariski dense in $V_n^k$. Finally, the rightmost inequality in (6) is
seen by counting parameters in the definition (1)-(2) of the RBM model $M_n^k$, and
by bounding its dimension by the dimension of the ambient space $\Delta_{2^n-1}$.

We conjecture that both of the inequalities in (6) are actually equalities:

Conjecture 2.2. The tropical RBM model has the expected dimension, i.e. $TM_n^k$
is a polyhedral fan of dimension $\min\{nk + n + k, 2^n - 1\}$ in $\mathbb{TP}^{2^n-1}$.

In light of the inequalities (6), Conjecture 2.2 implies Conjecture 1.1. In Section 4
we shall prove some special cases of these conjectures, including Theorem 1.2.

3. The First Secant Variety of the n-Cube



We saw in Proposition 2.1 that the RBM for $k \ge 2$ can be expressed as the
Hadamard power of the RBM for k = 1. Therefore, it is crucial to understand
the model with one hidden node. In this section we fix k = 1 and we present an
analysis of that case. In particular, we shall give a combinatorial description of the
fan $TM_n^1$ which shows that it has dimension 2n + 1, as stated in Conjecture 2.2.



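We begin with a reparameterization of our model that describes it as a secant
variety. Let $\lambda, \delta_1, \ldots, \delta_n, \epsilon_1, \ldots, \epsilon_n$ be real parameters which range over the open
interval (0, 1), and consider the polynomial map $p : (0,1)^{2n+1} \to \Delta_{2^n-1}$ whose
coordinates are given by

(7) $p(v) = \lambda \prod_{i=1}^n \delta_i^{1-v_i} (1-\delta_i)^{v_i} + (1-\lambda) \prod_{i=1}^n \epsilon_i^{1-v_i} (1-\epsilon_i)^{v_i}$ for $v \in \{0,1\}^n$.

Proposition 3.1. The image of p coincides with the RBM model $M_n^1$.

Proof. Recall the parameterization (3) of the RBM model from Section 2:

(8) $p(v) = \frac{1}{Z} \beta_1^{v_1} \beta_2^{v_2} \cdots \beta_n^{v_n} (1 + \gamma \omega_1^{v_1} \omega_2^{v_2} \cdots \omega_n^{v_n})$ for $v \in \{0,1\}^n$.

We define a bijection between the parameter spaces $\mathbb{R}_{>0}^{2n+1}$ and $(0,1)^{2n+1}$ as follows:

$\beta_i = \frac{1-\delta_i}{\delta_i}$ and $\omega_i = \frac{\delta_i (1-\epsilon_i)}{\epsilon_i (1-\delta_i)}$ for i = 1, 2, ..., n,

$\gamma = Z (1-\lambda) \epsilon_1 \epsilon_2 \cdots \epsilon_n$ where $Z = (\lambda \delta_1 \delta_2 \cdots \delta_n)^{-1}$.

This substitution is invertible and it transforms (8) into (7). □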
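
The substitution in the proof can be verified with exact rational arithmetic. The Python sketch below is an illustration, not the authors' code: it assumes the reading $\beta_i = (1-\delta_i)/\delta_i$ and $\omega_i = \delta_i(1-\epsilon_i)/(\epsilon_i(1-\delta_i))$ of the change of coordinates, and the function names and parameter values are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import prod


def mixture(lam, delta, eps):
    """Parameterization (7): mixture of two product distributions."""
    n = len(delta)
    return {
        v: lam * prod(d if x == 0 else 1 - d for d, x in zip(delta, v))
        + (1 - lam) * prod(e if x == 0 else 1 - e for e, x in zip(eps, v))
        for v in product((0, 1), repeat=n)
    }


def rbm_parameters(lam, delta, eps):
    """Change of coordinates from (lambda, delta, epsilon) to (Z, beta, omega, gamma)."""
    Z = 1 / (lam * prod(delta))
    beta = [(1 - d) / d for d in delta]
    omega = [d * (1 - e) / (e * (1 - d)) for d, e in zip(delta, eps)]
    gamma = Z * (1 - lam) * prod(eps)
    return Z, beta, omega, gamma


def one_hidden_node(Z, beta, omega, gamma):
    """Parameterization (8) of the RBM model with a single hidden node."""
    n = len(beta)
    return {
        v: prod(bi ** x for bi, x in zip(beta, v))
        * (1 + gamma * prod(wi ** x for wi, x in zip(omega, v))) / Z
        for v in product((0, 1), repeat=n)
    }


# Hypothetical rational parameters in the open interval (0, 1), n = 3.
lam = Fraction(1, 3)
delta = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 4)]
eps = [Fraction(2, 3), Fraction(1, 7), Fraction(1, 4)]

p_mixture = mixture(lam, delta, eps)
p_rbm = one_hidden_node(*rbm_parameters(lam, delta, eps))
```

Because `Fraction` arithmetic is exact, the two dictionaries can be compared with `==` rather than with a tolerance.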



Proposition 3.1 shows that $M_n^1$ is the first mixture of the independence model
for n binary random variables. In phylogenetics, it coincides with the general
Markov model on the star tree with n leaves. A semi-algebraic characterization of
that model follows as a special case from recent results of Zwiernik and Smith [32].
We shall present and discuss their characterization in Theorem 3.4 below.

First, however, we remark that the Zariski closure of a mixture of an indepen- 
dence model is a secant variety of the corresponding Segre variety. This fact is 
well-known (see e.g. [13, §4.1]) and is here easily seen from (7). We conclude:

Corollary 3.2. The first RBM variety $V_n^1$ coincides with the first secant variety
of the Segre embedding of the product of projective lines $(\mathbb{P}^1)^n$ into $\mathbb{P}^{2^n-1}$, and
the first tropical RBM variety $TV_n^1$ is the tropicalization of that secant variety.

We next describe the equations defining the first secant variety $V_n^1$. The coordinate
functions p(v) are the entries of an n-dimensional table of format $2 \times 2 \times \cdots \times 2$.
For each set partition $\{1, 2, \ldots, n\} = A \cup B$ we can write this table as an ordinary
two-dimensional matrix of format $2^{|A|} \times 2^{|B|}$, with rows indexed by $\{0,1\}^A$ and
columns indexed by $\{0,1\}^B$. These matrices are the flattenings of the $2 \times 2 \times \cdots \times 2$-table.
Pachter and Sturmfels [26, Conjecture 13] conjectured that the homogeneous
prime ideal of the projective variety $V_n^1 \subset \mathbb{P}^{2^n-1}$ is generated by the 3x3-minors
of all the flattenings of the table $(p(v))_{v \in \{0,1\}^n}$. This conjecture has been verified
computationally for $n \le 5$. A more general form of this conjecture was stated in [15,
§7]. The set-theoretic version of that general conjecture was proved by Landsberg
and Manivel in [20, Theorem 5.1]. Their results imply:

Theorem 3.3 (Landsberg-Manivel). The projective variety $V_n^1 \subset \mathbb{P}^{2^n-1}$ is the
common zero set of the 3x3-minors of all the flattenings of the table $(p(v))_{v \in \{0,1\}^n}$.
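
For n = 4 the vanishing of these minors on the model can be checked directly. The Python sketch below is our own illustration with hypothetical helper names and parameter values: it builds a point of the model from the mixture parameterization (7), forms the three 4x4 flattenings, and evaluates all forty-eight 3x3-minors exactly. A table supported on three suitably chosen vertices is included as a contrasting example with a non-vanishing minor.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod


def mixture(lam, delta, eps):
    """Parameterization (7): mixture of two product distributions."""
    n = len(delta)
    return {
        v: lam * prod(d if x == 0 else 1 - d for d, x in zip(delta, v))
        + (1 - lam) * prod(e if x == 0 else 1 - e for e, x in zip(eps, v))
        for v in product((0, 1), repeat=n)
    }


def flattening(p, n, A):
    """Matrix with rows indexed by {0,1}^A and columns by {0,1}^B, B the complement of A."""
    B = [i for i in range(n) if i not in A]
    matrix = []
    for r in product((0, 1), repeat=len(A)):
        row = []
        for c in product((0, 1), repeat=len(B)):
            v = [0] * n
            for i, x in zip(A, r):
                v[i] = x
            for i, x in zip(B, c):
                v[i] = x
            row.append(p[tuple(v)])
        matrix.append(row)
    return matrix


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def minors_3x3(M):
    """All 3x3-minors of the matrix M (none if M has fewer than 3 rows)."""
    return [
        det3([[M[r][c] for c in cs] for r in rs])
        for rs in combinations(range(len(M)), 3)
        for cs in combinations(range(len(M[0])), 3)
    ]


# Hypothetical rational parameters for n = 4.
lam = Fraction(2, 5)
delta = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 4), Fraction(1, 3)]
eps = [Fraction(2, 3), Fraction(1, 7), Fraction(1, 4), Fraction(5, 6)]
p = mixture(lam, delta, eps)

splits = [(0, 1), (0, 2), (0, 3)]  # the three splits into two pairs
model_minors = [m for A in splits for m in minors_3x3(flattening(p, 4, A))]

# A table outside the variety: uniform on three vertices, zero elsewhere.
support = {(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 1, 0)}
t = {v: Fraction(1, 3) if v in support else Fraction(0)
     for v in product((0, 1), repeat=4)}
other_minors = minors_3x3(flattening(t, 4, (0, 1)))
```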

We now come to the inequalities that determine $M_n^1$ among the real points of
$V_n^1$. For any pair of indices $i, j \in \{1, 2, \ldots, n\}$ we write $\sigma_{ij}$ for the covariance of the
two random variables $X_i$ and $X_j$ obtained by marginalizing the distribution, and
we write $\Sigma = (\sigma_{ij})$ for the n x n-covariance matrix. We regard $\Sigma$ as a polynomial
map from the simplex $\Delta_{2^n-1}$ to the space $\mathbb{R}^{\binom{n+1}{2}}$ of symmetric n x n-matrices. The
off-diagonal entries of the covariance matrix $\Sigma$ are the 2x2-minors obtained by
marginalization from the table (p(v)). For example, for n = 4 the covariances are



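$\sigma_{12} = \det \begin{pmatrix} p_{0000}+p_{0001}+p_{0010}+p_{0011} & p_{0100}+p_{0101}+p_{0110}+p_{0111} \\ p_{1000}+p_{1001}+p_{1010}+p_{1011} & p_{1100}+p_{1101}+p_{1110}+p_{1111} \end{pmatrix}$,

$\sigma_{13} = \det \begin{pmatrix} p_{0000}+p_{0001}+p_{0100}+p_{0101} & p_{0010}+p_{0011}+p_{0110}+p_{0111} \\ p_{1000}+p_{1001}+p_{1100}+p_{1101} & p_{1010}+p_{1011}+p_{1110}+p_{1111} \end{pmatrix}$.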



Zwiernik and Smith [32] gave a semi-algebraic characterization of the general
Markov model on a trivalent phylogenetic tree in terms of covariances and moments.
The statement of their characterization is somewhat complicated, so we only state
a weaker necessary condition rather than the full characterization. Specifically,
applying [32, Theorem 7] to the star tree on n leaves implies the following result.

Corollary 3.4. If a probability distribution $p \in \Delta_{2^n-1}$ lies in the first RBM
model $M_n^1$ then all its matrix flattenings (as in Theorem 3.3) have rank $\le 2$ and
$\sigma_{ij}\sigma_{ik}\sigma_{jk} \ge 0$ for all distinct triples $i, j, k \in \{1, 2, \ldots, n\}$.
These inequalities follow easily from the parameterization (7), which yields

$\sigma_{ij} = \lambda (1-\lambda) (\delta_i - \epsilon_i)(\delta_j - \epsilon_j)$.

This factorization also shows that the binomial relations $\sigma_{ij}\sigma_{kl} = \sigma_{il}\sigma_{jk}$ hold on
$M_n^1$. These same binomial relations are valid for the covariances in factor analysis
[12, Theorem 16], thus further underlining the analogies between the Gaussian
case and the binary case. Theorem 20 in [32] extends the covariance equations
$\sigma_{ij}\sigma_{kl} = \sigma_{il}\sigma_{jk}$ to a collection of quadratic binomial equations in all tree-cumulants,
which in turn can be expressed in terms of higher order correlations. For the star
tree, these equations are equivalent on $\Delta_{2^n-1}$ to the rank $\le 2$ constraints. However,
for general tree models, the binomial equations in the tree-cumulants are necessary
conditions for distributions to lie in these models.
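
The covariance factorization and its two consequences can be checked in exact arithmetic. The following Python sketch is an illustration under our assumptions (hypothetical names and parameter values; indices are 0-based in the code): it computes each covariance as the 2x2 determinant of the marginal table of the mixture distribution, as described above.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod


def mixture(lam, delta, eps):
    """Mixture of two product distributions on {0,1}^n."""
    n = len(delta)
    return {
        v: lam * prod(d if x == 0 else 1 - d for d, x in zip(delta, v))
        + (1 - lam) * prod(e if x == 0 else 1 - e for e, x in zip(eps, v))
        for v in product((0, 1), repeat=n)
    }


def covariance(p, i, j):
    """Covariance of X_i and X_j as the determinant of their 2x2 marginal table."""
    m = [[0, 0], [0, 0]]
    for v, value in p.items():
        m[v[i]][v[j]] += value
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# Hypothetical rational parameters for n = 4.
lam = Fraction(2, 5)
delta = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 4), Fraction(1, 3)]
eps = [Fraction(2, 3), Fraction(1, 7), Fraction(1, 4), Fraction(5, 6)]
p = mixture(lam, delta, eps)

sigma = {(i, j): covariance(p, i, j) for i, j in combinations(range(4), 2)}
predicted = {
    (i, j): lam * (1 - lam) * (delta[i] - eps[i]) * (delta[j] - eps[j])
    for i, j in combinations(range(4), 2)
}
triple_products = [
    sigma[(i, j)] * sigma[(i, k)] * sigma[(j, k)]
    for i, j, k in combinations(range(4), 3)
]
```

Each triple product equals $\lambda^3(1-\lambda)^3$ times a perfect square, which is why the sign condition holds for every choice of parameters.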

We now turn to the tropical versions of the RBM model for k = 1. The variety
$V_n^1$ is cut out by the 3x3-minors of all flattenings of the table $(p(v))_{v \in \{0,1\}^n}$. It is
known that the 3 x 3-minors of one fixed two-dimensional matrix form a tropical 
basis (cf. [8, §2]). Indeed, that statement is equivalent to [9, Theorem 6.5]. It is
natural to ask whether the tropical basis property continues to hold for the set of all
3x3-determinants in Theorem 3.3. Since each flattening of our table corresponds
to a non-trivial edge split of a tree on n taxa (i.e. a partition of the set of taxa into
two sets each of cardinality $\ge 2$), our question can be reformulated as follows:

Question 3.5. Is the tropical RBM variety $TV_n^1$ equal to the intersection of
the tropical rank 2 varieties associated to non-trivial edge splits on a collection of 
trees on n taxa? 

The tropical rank two varieties associated to each of the edge splits have been
studied recently by Markwig and Yu [23]. They endow this determinantal variety
with a simplicial fan structure that has the virtue of being shellable. The cones
of this simplicial fan correspond to weighted bicolored trees on $2^{n-1}$ taxa with no
monochromatic cherries. The points in a cone can be viewed as a matrix encoding
the distances between leaves with different colors in the weighted bicolored tree.

Question 3.5 is void for $n \le 3$, so the first relevant case concerns n = 4 taxa.
We were surprised to learn that the answer is negative already in this case: 

Example 3.6. The prime ideal of the variety $V_4^1$ is generated by the 3x3-minors
of the three flattenings of the 2x2x2x2-table p. As a statistical model,
each one of the three flattenings corresponds to the graphical model associated to
each one of the quartet trees (12|34), (13|24) and (14|23), as depicted in Figure 2.




Figure 2: Quartet trees (a) (12|34), (b) (13|24) and (c) (14|23) associated to the flattenings for n = 4.



Algebraically, each flattening corresponds to the variety cut out by the 3x3-minors
of a 4x4-matrix of unknowns. These minors form a tropical basis. The tropical
variety they define is a pure fan of dimension 11 in $\mathbb{TP}^{15}$ with a 6-dimensional
lineality space. The simplicial fan structure on this variety given by [23] has the
f-vector (98, 1152, 4248, 6072, 2952). Combinatorially, this object is a shellable 4-dimensional
simplicial complex which is the bouquet of 73 spheres. However, this
determinantal variety admits a different fan structure, induced from the Gröbner
fan as in [4], or from the fact that the sixteen 3x3-minors form a tropical basis.
Its f-vector is (50, 360, 1128, 1680, 936).

The tropical variety $TV_4^1$ is a pure fan of dimension 9 in $\mathbb{TP}^{15}$. Its lineality space
has dimension 4, and the cones of various dimensions are tallied in its f-vector

$f(TV_4^1) = (382, 3436, 11236, 15640, 7680)$.



Question 3.5 asks whether the 9-dimensional tropical variety $TV_4^1$ is the intersection
of the three 11-dimensional tropical determinantal varieties associated with the
three trees in Figure 2. The answer is "no". Using the software Gfan [19], we computed
the tropical prevariety cut out by the union of all forty-eight 3x3-minors. The
output is a non-pure polyhedral fan of dimension 10 with a 4-dimensional lineality
space (the same one as that of $TV_4^1$), having f-vector (298, 2732, 9440, 13992, 7304, 96).
The tropical variety $TV_4^1$ is a triangulation of a proper subfan, and each of the
96 10-dimensional maximal cones lies in the prevariety but not in the variety. An
example of a vector in the relative interior of a maximal cone is

q = (59,1,80,86,102,108,107,113,109,115,100,106,78,84,21,43). 

(Here, coordinates are indexed in lexicographic order $p_{0000}, p_{0001}, \ldots, p_{1111}$.) Given
the weights q, the initial form of each 3x3-minor of each flattening is a binomial;
however, the initial form of the following polynomial in the ideal of $V_4^1$ is the
underlined monomial:

$\underline{p_{0000}p_{0110}p_{1010}p_{1101}} - p_{0010}p_{0100}p_{1000}p_{1111} + p_{0010}p_{0100}p_{1001}p_{1110}$
$- p_{0000}p_{0110}p_{1001}p_{1110} - p_{0001}p_{0110}p_{1010}p_{1100} + p_{0000}p_{0010}p_{1100}p_{1111}$
$- p_{0000}p_{0010}p_{1101}p_{1110} + p_{0001}p_{0110}p_{1000}p_{1110}$.

Anders Jensen performed another computation, using Gfan and SoPlex [31], which
verified that we get a tropical basis by augmenting the 3x3-minors with the above
quartic and its images under the symmetry group of the 4-cube. This is a non-trivial
computation because the corresponding fan structure on $TV_4^1$ has the f-vector

(37442, 321596, 843312, 880488, 321552). 

Using the language of [9], we may conclude from our computational results that 
the notions of tropical rank and Kapranov rank disagree for 2x2x2x2-tensors. □ 
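
The claim that the quartic has a monomial initial form with respect to q can be checked by hand or with a few lines of Python. The sketch below is our own illustration: it assumes the max convention for weights and transcribes the subscripts of the eight monomials in the order they appear in the quartic of Example 3.6. The weight of a monomial is the sum of the entries of q at its four indices.

```python
# Weight vector q from Example 3.6, indexed in lexicographic order 0000, 0001, ..., 1111.
q = (59, 1, 80, 86, 102, 108, 107, 113, 109, 115, 100, 106, 78, 84, 21, 43)

# Subscripts of the eight monomials of the quartic, in the order printed.
monomials = [
    ["0000", "0110", "1010", "1101"],
    ["0010", "0100", "1000", "1111"],
    ["0010", "0100", "1001", "1110"],
    ["0000", "0110", "1001", "1110"],
    ["0001", "0110", "1010", "1100"],
    ["0000", "0010", "1100", "1111"],
    ["0000", "0010", "1101", "1110"],
    ["0001", "0110", "1000", "1110"],
]


def weight(monomial):
    """Sum of the q-weights of the variables in a monomial."""
    return sum(q[int(index, 2)] for index in monomial)


weights = [weight(m) for m in monomials]
leading = weights.index(max(weights))
print(weights, "maximum attained by monomial number", leading + 1)
```

The maximum weight is attained exactly once, by the first monomial, so the initial form is a single monomial rather than a binomial.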

Last but not least, we examine the tropical model $TM_n^1$. This is a proper subfan
of the tropical variety $TV_n^1$, namely, $TM_n^1$ is the image of the tropical morphism
$\Phi : \mathbb{R}^{2n+1} \to \mathbb{TP}^{2^n-1}$ which is the specialization of (5) for k = 1. Equivalently, $\Phi$ is
the tropicalization of the map (8), and its coordinates are written explicitly as

(9) $q(v) = b^\top v + \max\{0, \omega v + c\}$.

This concave function is the maximum of two linear functions. The 2n + 1 parameters
are given by a column vector $b \in \mathbb{R}^n$, a row vector $\omega \in \mathbb{R}^n$, and a scalar
$c \in \mathbb{R}$. A different - but entirely equivalent - tropicalization can be derived from
(7). As v ranges over $\{0,1\}^n$, there are $2^n$ such concave functions, and these form
the coordinates of the tropical morphism $\Phi$. We note that $\Phi$ made its first explicit
appearance in [26, Equation (10)], where it was discussed in the context of ancestral
reconstruction in statistical phylogenetics. Subsequently, Develin [8] and Draisma
[10, §7.2] introduced a tropical approach to secant varieties of toric varieties, and
our model fits well into the context developed by these two authors.

Remark 3.7. The first tropical RBM model $TM_n^1$ is the image of the tropical
secant map for the Segre variety $(\mathbb{P}^1)^n$ in the sense of Develin [8] and Draisma [10].
The linear space for their constructions has basis $\{\sum_{\alpha \in \{0,1\}^n, \alpha_i = 1} e_\alpha : i = 1, \ldots, n\}$,
and the underlying point configuration consists of the vertices of the n-cube.

In light of Example 3.6 it makes sense to say that the $2 \times \cdots \times 2$-tensors in the
tropical variety $TV_n^1$ are precisely those that have Kapranov (tensor) rank $\le 2$.
This would be consistent with the results and nomenclature in [8, 9]. A proper
subset of the tensors of Kapranov rank $\le 2$ are those that have Barvinok (tensor)
rank $\le 2$. These are precisely the points in the first tropical RBM model $TM_n^1$.

We close this section by showing that $TM_n^1$ has the expected dimension:

Proposition 3.8. The dimension of the tropical RBM model $TM_n^1$ is 2n + 1.

Proof. Each region of linearity of the map $\Phi$ is defined by a partition C of
$\{0,1\}^n$ into two disjoint subsets $C^-$ and $C^+$, according to the condition $\omega v + c < 0$
or $\omega v + c > 0$. Thus, the corresponding region is an open convex polyhedral cone,
possibly empty, in the parameter space $\mathbb{R}^{2n+1}$. It consists of all triples $(b, \omega, c)$
such that $\omega v + c < 0$ for $v \in C^-$ and $\omega v + c > 0$ for $v \in C^+$. Assuming $n \ge 3$,
we can choose a partition C of $\{0,1\}^n$ such that this cone is non-empty and both
$C^-$ and $C^+$ affinely span $\mathbb{R}^n$. The image of the cone under the map $\Phi$ spans
a space isomorphic to the direct sum of the images of $b \mapsto (b^\top v : v \in C^-)$ and
$(\omega, c) \mapsto (\omega v + c : v \in C^+)$. Hence this image has dimension 2n + 1, as expected. □

An illustration of the proof of Proposition 3.8 is given in Figure 3. The technique
of partitioning the vertices of the cube will be essential in our dimension
computations for general k in the next section. In Section 5 we return to the small
models $TM_n^1$ and take a closer look at their geometric and statistical properties.





Figure 3: Partitions of $\{0,1\}^3$ that define non-empty cones on which $\Phi$ is linear.
Here $C^+$ and $C^-$ are indicated by black (•) and white (o) vertices of the 3-cube.
The slicing on the right represents a cone in the parameter space whose image under
$\Phi$ is full-dimensional, while the one on the left does not.



4. The Tropical Model and its Dimension 

This section is concerned with Conjecture 2.2 which states that the tropical
RBM model has the expected dimension. Namely, our aim is to show that

$\dim(TM_n^k) = kn + k + n$ for $k \le \frac{2^n - 1 - n}{n+1}$.

For k = 1 this is Proposition 3.8, and we now consider the general case $k \ge 2$. Our
main tool towards this goal is the dimension formula in Theorem 4.2 below. As in
the previous section, we study the regions of linearity of the tropical morphism $\Phi$.

Let A denote the matrix of format $2^n \times n$ whose rows are the vectors in $\{0,1\}^n$.
A subset C of the vertices of the n-cube is a slicing if there exists a hyperplane that
has the vertices in C on the positive side and the remaining vertices of the n-cube
on the other side. In the notation in the proof of Proposition 3.8 the subset C was
denoted by $C^+$. Two examples of slicings for n = 3 are shown in Figure 3.

For any slicing C of the n-cube, let $A_C$ be the $2^n \times (n+1)$-matrix whose rows
indexed by the vertices $v \in C$ are $(1, v) \in \{0,1\}^{n+1}$ and whose other rows are all
identically zero. The following result extends the argument used for Proposition 3.8.



Lemma 4.1. On each region of linearity, the tropical morphism $\Phi$ in (5) coincides
with the linear map represented by a $2^n \times (nk + n + k)$-matrix of the form

$\mathcal{A} = (A \mid A_{C_1} \mid A_{C_2} \mid \cdots \mid A_{C_k})$,

for some slicings $C_1, C_2, \ldots, C_k$ of the n-cube.

Proof. The tropical map $\Phi : \mathbb{R}^{nk+n+k} \to \mathbb{TP}^{2^n-1}$ can be written as follows:

$\Phi(W, b, c) = \Big( \sum_{i=1}^k \max\{(Wv + c)_i, 0\} + b^\top v \Big)_{v \in \{0,1\}^n}$.

Consider a parameter vector $\theta$ with coordinates

$\theta := (b_1, b_2, \ldots, b_n, c_1, \omega_{11}, \ldots, \omega_{1n}, c_2, \omega_{21}, \ldots, \omega_{2n}, \ldots, c_k, \omega_{k1}, \ldots, \omega_{kn})$,

where $\omega_{ij}$ denotes the entries of W. We associate to this vector the k hyperplanes
$H_i(\theta) = \{v \in \mathbb{R}^n : \omega_{i1} v_1 + \cdots + \omega_{in} v_n + c_i = 0\}$ for i = 1, 2, ..., k.
Let us assume that $\theta$ is chosen generically. Then,
for each index i, we have $\{0,1\}^n \cap H_i(\theta) = \emptyset$, and we obtain a slicing of the n-cube
with $C_i(\theta) := \{v \in \{0,1\}^n : \sum_{j=1}^n \omega_{ij} v_j + c_i > 0\}$. The generic parameter vector $\theta$
lies in a unique open region of linearity of the tropical morphism $\Phi$. More precisely,
this region corresponds to the cone of all $\theta'$ in $\mathbb{R}^{nk+n+k}$ such that $C_i(\theta) = C_i(\theta')$ for
i = 1, 2, ..., k. By construction, the map $\Phi$ is linear on this cone.

Following the definition of $\Phi$ we see that this linear map is just left multiplication
of the vector $\theta$ by a matrix whose rows are indexed by the observed states v and
columns indexed by the coordinates of $\theta$. This matrix is precisely the matrix $\mathcal{A}$
above, where $C_i = C_i(\theta)$ for i = 1, 2, ..., k. The result follows by continuity of the
map $\Phi$. □



As an immediate consequence of Lemma |4.1| we obtain the following result: 

Theorem 4.2. The dimension of the tropical RBM model $TM_n^k$ equals the
maximum rank of any matrix of size $2^n \times (nk+n+k)$ of the form

$$\mathcal{A} = (A \mid A_{C_1} \mid A_{C_2} \mid \cdots \mid A_{C_k}),$$

where $\{C_1, C_2, \ldots, C_k\}$ is any set of k slicings of the n-cube.



Theorem 4.2 furnishes a tool to attack Conjecture 2.2. What remains is the
combinatorial problem of finding a suitable collection of slicings of the n-cube. In 
what follows we shall apply existing results from coding theory to this problem. 

There are two quantities from the coding theory literature [2, 5, 6, 18] that
are of interest to us. The first one is $A_2(n,3)$, the size (number of codewords) of
the largest binary code on n bits with each pair of codewords at least Hamming
distance (number of bit flips) 3 apart. The second one is $K_2(n,1)$, the size of the
smallest covering code on n bits. In other words, $K_2(n,1)$ is the least number of
codewords such that every string of n bits lies within Hamming distance one of
some codeword. We obtain:

Corollary 4.3. The dimension of the tropical RBM model satisfies

• $\dim TM_n^k = nk + n + k$ for $k < A_2(n,3)$,

• $\dim TM_n^k = \min\{nk + n + k,\, 2^n - 1\}$ for $k = A_2(n,3)$,

• $\dim TM_n^k = 2^n - 1$ for $k \ge K_2(n,1)$.
Proof. For the first statement, let $k \le A_2(n,3) - 1$ and fix a code with
minimum distance $\ge 3$. For each codeword let $C_j$ denote its Hamming neighborhood,
that is, the codeword together with all strings that are at Hamming distance 1.
These $k - 1$ sets $C_j$, together with some Hamming neighborhood in the complement
of their union, are pairwise disjoint, and each of them corresponds to a slicing of
the cube as in Theorem 4.2. The disjointness of the k neighborhoods means that
$nk + n + k \le 2^n - 1$. Elementary row and column operations can now be used to
see that the corresponding $2^n \times (nk+n+k)$ matrix $\mathcal{A} = (A \mid A_{C_1} \mid \cdots \mid A_{C_k})$ has rank
$nk + n + k$. This is because, after such operations, $\mathcal{A}$ consists of a block of format
$n \times n$ and k blocks of format $(n+1) \times (n+1)$ along the diagonal. The first block
has rank n and the remaining k blocks have rank $n + 1$ each. The same reasoning
is valid for $k = A_2(n,3)$ except that it may now happen that $nk + k + n \ge 2^n$. In
this case, the k blocks have total rank $k(n+1)$ and together with the first $n \times n$
block they give a matrix of maximal rank $\min\{nk + n + k,\, 2^n - 1\}$.

For the third statement, we suppose $C_1, \ldots, C_k$ are slicings with subslicings
$C'_i \subseteq C_i$ such that the $C'_i$ are disjoint and no $n + 1$ of the vertices in a given $C_i$
lie in a hyperplane. Then $\operatorname{rank}(\mathcal{A}) \ge n + \sum_{i=1}^{k} |C'_i|$ by similar arguments. This is
because we may construct the $C'_i$ by pruning neighbors from codewords, and are
left with a lower-dimensional Hamming neighborhood which is a slicing. □
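The rank computation in the first part of the proof can be reproduced on small cubes. The sketch below is illustrative only: the helper names `hamming_ball`, `build_matrix` and `rank` are hypothetical. The first block A is assumed to be the $2^n \times n$ matrix with rows v, which is consistent with the column count $nk+n+k$ but is not spelled out in this section. The rank is computed exactly over the rationals.

```python
from fractions import Fraction
from itertools import product


def hamming_ball(center):
    """The codeword together with all strings at Hamming distance 1."""
    ball = {tuple(center)}
    for j in range(len(center)):
        flipped = list(center)
        flipped[j] ^= 1
        ball.add(tuple(flipped))
    return ball


def build_matrix(n, slicings):
    """Assemble (A | A_C1 | ... | A_Ck); rows are indexed by the vertices v."""
    rows = []
    for v in product((0, 1), repeat=n):
        row = list(v)  # assumed first block: the vertex itself
        for C in slicings:
            row += [1, *v] if v in C else [0] * (n + 1)
        rows.append(row)
    return rows


def rank(rows):
    """Exact rank via Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            if factor:
                M[i] = [a - factor * p for a, p in zip(M[i], M[r])]
        r += 1
    return r


# n = 5, k = 2: two codewords at Hamming distance 3 give disjoint neighborhoods.
balls = [hamming_ball((0, 0, 0, 0, 0)), hamming_ball((1, 1, 1, 0, 0))]
expected_dimension = 5 * 2 + 5 + 2
```

With these two disjoint neighborhoods, `rank(build_matrix(5, balls))` should equal `expected_dimension`, in line with the block structure described in the proof.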

The computation of $A_2(n,3)$ and $K_2(n,1)$, both in general and for specific
values of n, has been an active area of research since the 1950s. In Table 1 we
summarize some of the known results for specific values of n. This table is based
on [5, 22]. For general values of n, the following bounds can be obtained.

Proposition 4.4. For binary codes with $n \ge 3$, the Varshamov bound holds:

$$A_2(n,3) \ge 2^{\,n - \lceil \log_2(n+1) \rceil}.$$

For covering codes, the following inequality holds:

$$K_2(n,1) \le 2^{\,n - \lfloor \log_2(n+1) \rfloor}.$$

For $n = 2^\ell - 1$ with $\ell \ge 3$, we have the equality $A_2(n,3) = K_2(n,1) = 2^{2^\ell - \ell - 1}$.

Proof. A proof of the Varshamov bound on $A_2(n,3)$ may be found in [18].
The last statement holds because $A_2(n,3) = K_2(n,1)$ for perfect Hamming codes:
for every $\ell \ge 3$ there is a perfect $(2^\ell - 1,\, 2^\ell - \ell - 1,\, 3)$ Hamming code (i.e. a perfect
Hamming code on $2^\ell - 1$ bits, of size $2^{2^\ell - \ell - 1}$, and with Hamming distance 3). For
a proof of this result, see [6]. Additionally, we have $K_2(2^m - 1, 1) = 2^{2^m - m - 1}$ for
$m \ge 3$; see [5].

The simple upper bound on $K_2(n,1)$ can be obtained by using overlapping
copies of the next smallest Hamming code. Suppose $n \ne 2^\ell - 1$ for any $\ell$, i.e. n
lies between Hamming integers (i.e. integers of the form $2^\ell - 1$). Let $\underline{n}$ be
the next smallest Hamming integer, with $\ell = \lfloor \log_2(n+1) \rfloor$, so $\underline{n} = 2^\ell - 1$. The
number of hidden nodes needed to cover the $\underline{n}$-cube is exactly $K_2(\underline{n},1) = 2^{\underline{n} - \ell}$.
We may use the $\underline{n}$ codes to cover each of the $2^{n-\underline{n}}$ faces of the n-cube with $2^{\underline{n}}$
vertices, although we will have overlaps. That is,

(10) $K_2(n,1) \le K_2(\underline{n},1) \cdot 2^{\,n - \underline{n}}.$

Taking $\log_2$ in the inequality (10), we obtain

$$\log_2 K_2(n,1) \le \log_2\bigl(K_2(\underline{n},1)\, 2^{\,n - \underline{n}}\bigr) = n - \lfloor \log_2(n+1) \rfloor.$$

This implies $K_2(n,1) \le 2^{\,n - \lfloor \log_2(n+1) \rfloor}$. □

Table 1: Special cases where Conjecture 2.2 holds, based on [5, 22] and Corollary 4.5.
Bold entries show improvements made by various researchers on the bounds
provided by Corollary 4.5. For example, for n = 19, $TM_{19}^k$ has the expected dimension
if $k \le 2^{12} \cdot 5 = 20480$ and dimension $2^n - 1$ if $k \ge 31744$, while the Corollary
4.5 bounds are $2^{14} = 16384$ and $2^{15} = 32768$, respectively. The $k \le$ column lists
lower bounds on $A_2(n,3)$ while the $k \ge$ column lists upper bounds on $K_2(n,1)$.
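The equality case of Proposition 4.4 can be verified directly for $\ell = 3$, that is, n = 7. The sketch below assumes the standard description of the Hamming code as the set of words whose syndrome vanishes, where the syndrome is the XOR of the (1-based) positions of the nonzero bits. This construction is textbook material and is not taken from the text above; the function names are hypothetical.

```python
from itertools import combinations


def syndrome(x, n):
    """XOR of the 1-based positions of the bits set in the n-bit word x."""
    s = 0
    for i in range(n):
        if (x >> i) & 1:
            s ^= i + 1
    return s


def hamming_code(ell):
    """Return (n, codewords) for the Hamming code of length n = 2^ell - 1."""
    n = 2**ell - 1
    return n, [x for x in range(2**n) if syndrome(x, n) == 0]


def min_distance(code):
    """Smallest Hamming distance between two distinct codewords."""
    return min(bin(a ^ b).count("1") for a, b in combinations(code, 2))


def covers(code, n):
    """True if every n-bit word lies within Hamming distance 1 of the code."""
    covered = set(code)
    for x in code:
        for i in range(n):
            covered.add(x ^ (1 << i))
    return len(covered) == 2**n


n, code = hamming_code(3)
```

Since the code has minimum distance 3 and covering radius 1 at the same time, its size is simultaneously a lower bound for $A_2(7,3)$ and an upper bound for $K_2(7,1)$.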

Our method results in the following upper and lower bounds for arbitrary values 
of n. Note that the bound is tight if n + 1 is a power of 2. Otherwise there might 
be a multiplicative gap of up to 2 between the lower and upper bound. In addition 
to these general bounds, we have the specific results recorded in Table 1.

Corollary 4.5. The coding theory argument leads to the following bounds:

• If $k < 2^{\,n - \lceil \log_2(n+1) \rceil}$, then $\dim TM_n^k = nk + n + k$.

• If $k = 2^{\,n - \lceil \log_2(n+1) \rceil}$, then $\dim TM_n^k = \min\{nk + n + k,\, 2^n - 1\}$.

• If $k \ge 2^{\,n - \lfloor \log_2(n+1) \rfloor}$, then $\dim TM_n^k = 2^n - 1$.
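The thresholds in Corollary 4.5 are easy to tabulate. The following sketch uses hypothetical function names and computes the ceiling and floor of $\log_2(n+1)$ with integer bit lengths to avoid floating point rounding. It reads the third bullet with a non-strict inequality, which is an assumption about the garbled source. The function returns `None` in the gap between the two thresholds, where the corollary makes no statement.

```python
def packing_threshold(n):
    """2^(n - ceil(log2(n+1))); for n >= 1, ceil(log2(n+1)) == n.bit_length()."""
    return 2 ** (n - n.bit_length())


def covering_threshold(n):
    """2^(n - floor(log2(n+1))); floor(log2(m)) == m.bit_length() - 1."""
    return 2 ** (n - ((n + 1).bit_length() - 1))


def tropical_dimension(n, k):
    """Dimension of the tropical model when Corollary 4.5 decides it, else None."""
    expected = n * k + n + k
    full = 2**n - 1
    if k < packing_threshold(n):
        return expected
    if k == packing_threshold(n):
        return min(expected, full)
    if k >= covering_threshold(n):
        return full
    return None  # k lies in the gap between the two thresholds
```

For n = 19 the two thresholds differ by a factor of 2, while for n = 7 (a Hamming integer) they coincide.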
Proof of Theorem 1.2. This is now easily completed by combining Corollary 4.5
with the inequalities in m[. □



We close this section with the remark that the use of Hamming codes is a stan- 
dard tool in the study of dimensions of secant varieties. We learned this technique 
from Tony Geramita. For a review of the relevant literature see [10]. It is important
to note that, in spite of the combinatorial similarities, the varieties we study here 
are different and more complicated than higher secant varieties of Segre varieties. 



5. Polyhedral Geometry of Parametric Inference 

The tropical model $TM_n^k$ is not just a convenient tool for estimating the
dimension of the statistical model $M_n^k$. It is also of interest as the geometric object
that organizes the space of inference functions which the model can compute. This
statistical interpretation of tropical spaces was introduced in [26] and further
developed in [14, 27]. We shall now discuss this perspective for the RBM model.

Given an RBM model with fixed parameters learned by some estimation
procedure and an observed state v, we want to infer which value h of the hidden data
maximizes $\mathrm{Prob}(h \mid v)$. The inferred string h might be used in classification or
as the input data for another RBM in a deep architecture. Such a vector of hidden
states is called an explanation of the observation v. Each choice of parameters
$\theta = (b, W, c)$ defines an inference function $I_\theta$ sending $v \mapsto h$. The value $I_\theta(v)$ equals
the hidden string $h \in \{0,1\}^k$ that attains the maximum in the tropical polynomial

(11) $\max_{h \in \{0,1\}^k} \{h^\top W v + c^\top h + b^\top v\} = b^\top v + \max_{h \in \{0,1\}^k} \{h^\top W v + c^\top h\}.$

In order for the inference function $I_\theta$ to be well-defined, it is necessary (and
sufficient) that $\theta = (b, W, c)$ lies in an open cone of linearity of the tropical
morphism $\Phi$. In that case, the maximum in equation (11) is attained for a unique value
of h. That h can be recovered from the expression of $\Phi$ as we vary the parameters
in the fixed cone of linearity. Thus, the inference functions are in one-to-one
correspondence with the regions of linearity of the tropical morphism $\Phi$.

The RBM model grew out of work on artificial neurons modeled as linear threshold
functions [24, 28], and we pause our geometric discussion to make a few remarks
about these functions and the types of inference functions that our model can
represent. A linear threshold function is a function $\{0,1\}^n \to \{0,1\}$ defined by choosing
a weight vector $\omega$ and a target weight $\pi$. For any point $v \in \{0,1\}^n$ we compute
the value $\omega v$, we test whether this quantity is at most $\pi$ or not, and we assign value 0
or 1 to v depending on $\pi \ge \omega v$ or $\pi < \omega v$. The weights $\omega, \pi$ define a hyperplane
in $\mathbb{R}^n$ such that the vertices of the n-cube lie on the "true" or "false" side of the
hyperplane. Using the linear threshold functions, we construct a k-valued function
$\{0,1\}^n \to \{0,1\}^k$ where we replace the weight vector $\omega$ by a $k \times n$ matrix W and the
target weight $\pi$ by a vector $\pi \in \mathbb{R}^k$. More precisely, the function assigns a vertex
of the k-cube where the i-th coordinate equals 0 if $(Wv)_i > \pi_i$ and 1 if not. Our
discussion of slicings of the n-cube in Section 4 implies the following observation:

Proposition 5.1. The inference functions for the restricted Boltzmann machine
model are precisely those Boolean functions $\{0,1\}^n \to \{0,1\}^k$ for which
each of the k coordinate functions $\{0,1\}^n \to \{0,1\}$ is a linear threshold function.
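Proposition 5.1 can be illustrated by comparing a brute-force maximization of (11) with coordinate-wise thresholding. This is a minimal sketch with hypothetical function names. The term $b^\top v$ is omitted because, by (11), it does not depend on h. The convention that coordinate i equals 1 when $(Wv)_i + c_i > 0$ is one possible choice of sign and is an assumption of this example.

```python
from itertools import product


def explanation_bruteforce(W, c, v):
    """Hidden string h maximizing h^T W v + c^T h, found by enumeration."""
    k = len(c)
    activations = [sum(wij * vj for wij, vj in zip(W[i], v)) + c[i]
                   for i in range(k)]

    def score(h):
        return sum(hi * a for hi, a in zip(h, activations))

    return max(product((0, 1), repeat=k), key=score)


def explanation_threshold(W, c, v):
    """Apply one linear threshold function per hidden node."""
    return tuple(
        1 if sum(wij * vj for wij, vj in zip(row, v)) + ci > 0 else 0
        for row, ci in zip(W, c)
    )


# Generic parameters: even weights and odd offsets, so no activation is zero.
W = [[2, 2, 2], [-4, 2, -2]]
c = [-3, 1]
agree = all(
    explanation_bruteforce(W, c, v) == explanation_threshold(W, c, v)
    for v in product((0, 1), repeat=3)
)
```

For generic parameters the maximizer is unique, so the two functions should return the same explanation for every observed state v.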
Most Boolean functions are not linear threshold functions, that is, are not
inference functions for the model $M_n^1$. For example, the parity function cannot be
so represented. To be precise, while the number of all Boolean functions is $2^{2^n}$, it
is known [25] that for $n \ge 8$ the number $\lambda(n)$ of linear threshold functions satisfies

$$2^{\binom{n}{2} + 16} < \lambda(n) < 2^{n^2}.$$

The exact number $\lambda(n)$ of linear threshold functions has been computed for up to
n = 8. The On-Line Encyclopedia of Integer Sequences [29, A000609] reveals

(12) $\lambda(1 \ldots 8) = 4, 14, 104, 1882, 94572, 15028134, 8378070864, 17561539552946.$
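The first three entries of (12) can be recovered by brute force. The sketch below enumerates small integer weight vectors and odd offsets and collects the distinct truth tables. The function name and the weight range are assumptions of this example: weights of absolute value at most 2 suffice for $n \le 3$, but larger n would require larger weights, so this is not a general algorithm for $\lambda(n)$.

```python
from itertools import product


def count_threshold_functions(n, max_weight=2):
    """Count distinct truth tables v -> [2 * (w . v) + c > 0] on {0,1}^n."""
    vertices = list(product((0, 1), repeat=n))
    bound = 2 * max_weight * n + 1  # odd, so every offset c below is odd
    tables = set()
    for w in product(range(-max_weight, max_weight + 1), repeat=n):
        for c in range(-bound, bound + 1, 2):
            # 2 * (w . v) is even and c is odd, so the value is never zero.
            table = tuple(
                2 * sum(wj * vj for wj, vj in zip(w, v)) + c > 0
                for v in vertices
            )
            tables.add(table)
    return len(tables)
```

For n = 2 the count omits exactly the two parity-type functions (XOR and its negation) among the 16 Boolean functions of two variables.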

Combining k such functions for $k \ge 2$ yields $\lambda(n)^k = 2^{\Theta(kn^2)}$ possible inference
functions for the RBM model $M_n^k$. This number grows exponentially in the number
of model parameters. This is consistent with the result of Elizalde and Woods in
[14] which states that the number of inference functions of a graphical model grows
polynomially in the size of the graph when the number of parameters is fixed.

In typical implementations of RBMs using IEEE 754 doubles, the size in bits
of the representation is $64(nk + n + k)$. Thus the number $2^{\Theta(kn^2)}$ of inference
functions representable by a theoretical RBM will eventually outstrip the number
$2^{64(nk+n+k)}$ representable in a fixed-precision implementation; for example with
k = 100 hidden nodes, this happens at $n \ge 132$. As a result, the size of the regions
of linearity will shrink to single points in floating point representation. This is one
possible contributor to the difficulties that have been encountered in scaling RBMs.
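The crossover point quoted above can be reproduced under one explicit assumption: that the count of inference functions is replaced by the lower bound $2^{k(\binom{n}{2}+16)}$ coming from the inequality for $\lambda(n)$, which is valid for $n \ge 8$. The text does not say how the value 132 was derived, so the following sketch (with a hypothetical function name) is only one reading that happens to give the same number.

```python
from math import comb


def first_n_exceeding_capacity(k, bits_per_parameter=64):
    """Smallest n >= 8 with k * (C(n,2) + 16) > bits * (n*k + n + k)."""
    n = 8  # the lower bound on lambda(n) is stated for n >= 8
    while k * (comb(n, 2) + 16) <= bits_per_parameter * (n * k + n + k):
        n += 1
    return n
```

Comparing exponents rather than the powers of two themselves keeps the arithmetic in small integers.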

The tropical point of view allows us to organize the geometric information of
the space of inference functions into the tropical model $TM_n^k$, which can then be
analyzed with the tools of tropical and polyhedral geometry. We now describe this
geometry in the case k = 1. Geometrically, we can think of the linear threshold
functions as corresponding to the vertices of the $(n+1)$-dimensional zonotope
corresponding to the n-cube. This zonotope is the Minkowski sum in $\mathbb{R}^{n+1}$ of the $2^n$
line segments $[(1,0), (1,v)]$ where v ranges over the set $\{0,1\}^n$.

The quantity $\lambda(n)$ is the number of vertices of these zonotopes, and their facet
numbers were computed by Aichholzer and Aurenhammer [H Table 2]. They are 

(13) 4, 12, 40, 280, 6508, 504868, 142686416, 172493511216, . . . 



For example, the second entry in (12) and (13) refers to a 3-dimensional zonotope
known as the rhombic dodecahedron, which has 12 facets and $\lambda(2) = 14$ vertices.
Likewise, the third entry in (12) and (13) refers to a 4-dimensional zonotope with 40
facets and $\lambda(3) = 104$ vertices. The normal fan of that zonotope is an arrangement
of eight hyperplanes, indexed by $\{0,1\}^3$, which partitions $\mathbb{R}^4$ into 104 open convex
polyhedral cones. That partition lifts to a partition of the parameter space $\mathbb{R}^7$
for $M_3^1$ whose cones are precisely the regions on which the tropical morphism $\Phi$
is linear. The image of that morphism is the first non-trivial tropical RBM model
$TM_3^1$. This model has the expected dimension 7 and it happens to be a pure fan.

Example 5.2. The tropical RBM model $TM_3^1$ is a 7-dimensional fan whose
lineality space is 3-dimensional. It is a subfan of the secondary fan of the 3-cube
[HI Corollary 2.2]. The secondary fan of the 3-cube can be represented as a
3-dimensional polyhedral sphere with f-vector (22, 100, 152, 74). The 74 facets of
that 3-sphere correspond to triangulations of the 3-cube. The tropical model $TM_3^1$
consists of all regular subdivisions of the 3-cube with two regions covering all eight
vertices. It sits inside the polyhedral 3-sphere as a simplicial subcomplex with
f-vector (14, 40, 36, 12). Its 12 facets (tetrahedra) correspond to a single triangulation
type of the 3-cube as depicted in Figure 4c. The 14 vertices of $TM_3^1$ come in two
families: six vertices $D_i$ corresponding to diagonal cuts, as in Figure 4a, and eight
vertices $V_i$ representing corner cuts, as in Figure 4b. The edges come in three
families: four edges $V_iV_j$ corresponding to pairs of corner cuts at antipodal vertices
of the cube, twenty-four edges $V_iD_j$, and twelve edges $D_iD_j$. Finally, of the four
possible triangles, only two types are present: the ones with two vertices of different
type. Thus, there are 12 triangles $V_iV_jD_k$ and 24 triangles $V_iD_jD_k$.

(a) D1357 (b) Vs (c)

Figure 4: Subdivisions of the 3-cube that represent vertices and facets of $TM_3^1$.

Figure 5: The tropical model $TM_3^1$ is glued from four triangulated bipyramids. In
this octahedron graph, each of the bipyramids is represented by a shaded triangle.

Figure 5 depicts the simplicial complex $TM_3^1$ which is pure of dimension 3. The
six vertices $D_i$ and the twelve edges $D_jD_k$ form the edge graph of an octahedron.
The four nodes interior to the shaded triangles represent pairs of vertices $V_i$ that
are joined by an edge. Each of the shaded triangles represents three tetrahedra
that are glued together along a common edge $V_iV_j$. Thus the twelve tetrahedra in
$TM_3^1$ come as four triangulated bipyramids. The four bipyramids are then glued
into four of the triangles in the octahedron graph. Our analysis shows that the
complex $TM_3^1$ has reduced homology concentrated in degree 1 and it has rank 3. □

The previous example is based on the fact that the image of the tropical map
$\Phi$ is a subfan of the secondary fan of the n-cube. However, it is
important to note that $\Phi$ is not a morphism of fans with respect to the natural fan
structure on the parameter space given by the slicings of the n-cube.

Example 5.3. Consider the case n = 2. Here $TM_2^1$ equals $\mathbb{TP}^3$ with its secondary
fan structure coming from the two triangulations of the square. Modulo lineality,
this fan is simply the standard fan structure $\{\mathbb{R}_{<0}, \{0\}, \mathbb{R}_{>0}\}$ on the real line. The
fan structure on the parameter space $\mathbb{R}^5$ has 14 maximal cones. Modulo lineality,
this is the normal fan of the rhombic dodecahedron, i.e. a partition of $\mathbb{R}^3$ into 14
open convex cones by an arrangement of four planes through the origin. Ten of
these 14 open cones are mapped onto cones, namely, four are mapped onto $\mathbb{R}_{<0}$, two
are mapped onto $\{0\}$, and four onto $\mathbb{R}_{>0}$. The remaining four cones are mapped
onto $\mathbb{R}^1$, so $\Phi$ does not respect the fan structures relative to these four cones.

The situation is analogous for n = 3 but more complicated. The tropical map
$\Phi$ is injective on precisely eight of the 104 maximal cones in the parameter space.
These eight cones are the slicings shown in Figure 4a. The map $\Phi$ is injective
on such a cone, but the cone is divided into three subcones by the secondary fan
structure on $TM_3^1$. The resulting $24 = 3 \cdot 8$ maximal cells in the parameter space
are mapped in a 2-to-1 fashion onto the 12 tetrahedra in Figure 5. It would be
worthwhile to study the combinatorics of the graph of $\Phi$ for n > 3. □